NSF PAR Search | NSF Public Access Repository


The Matrix Profile in Seismology: Template Matching of Everything With Everything

https://doi.org/10.1029/2023JB027122

Shabikay_Senobari, Nader; Shearer, Peter M; Funning, Gareth J; Zimmerman, Zachary; Zhu, Yan; Brisk, Philip; Keogh, Eamonn (February 2024, Journal of Geophysical Research: Solid Earth)

Template matching has proven to be an effective method for seismic event detection, but is biased toward identifying events similar to previously known events, and thus is ineffective at discovering events with non‐matching waveforms (e.g., those dissimilar to existing catalog events). In principle, this limitation can be overcome by cross‐correlating every segment (possible template) of a seismogram with every other segment to identify all similar event pairs, but doing so has been previously considered computationally infeasible for long time series. Here we describe a method, called the ‘Matrix Profile’ (MP), a “correlate everything with everything” calculation that can be efficiently and scalably computed. The MP returns the maximum value of the correlation coefficient of every sub‐window of continuous data with every other sub‐window, as well as the best‐correlated sub‐window location. Here we show how MP methods can obtain valuable results when applied to months and years of continuous seismic data in both local and global case studies. We find that the MP can identify many new events in Parkfield, California seismicity that are not contained in existing event catalogs and that it can efficiently find clusters of similar earthquakes in global seismic data. Either used by itself, or as a starting point for subsequent template matching calculations, the MP is likely to provide a useful new tool for seismology research.
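
To make the "correlate everything with everything" idea concrete, here is a minimal brute-force sketch in plain Python. It is not the efficient algorithm the abstract refers to, only an illustration of what the MP returns: for each sub-window, the highest correlation coefficient with any other sub-window and the start of that best match. The function names, the exclusion zone of half a window, and the toy series are assumptions made for illustration.

```python
import math


def znorm(window):
    """Return a z-normalized copy of the window, or None if it is constant."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n
    if var == 0.0:
        return None  # correlation is undefined for a flat window
    std = math.sqrt(var)
    return [(x - mean) / std for x in window]


def naive_matrix_profile(series, m):
    """Brute-force O(n^2 * m) sketch of a correlation-based matrix profile.

    profile[i] is the highest Pearson correlation between the window that
    starts at i and any window outside the exclusion zone; index[i] is the
    start of that best-correlated window (-1 if none was found).
    """
    count = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(count)]
    exclusion = m // 2  # assumed choice: ignore near-self (trivial) matches
    profile = [-1.0] * count
    index = [-1] * count
    for i in range(count):
        if windows[i] is None:
            continue
        for j in range(count):
            if abs(i - j) <= exclusion or windows[j] is None:
                continue
            # Pearson correlation of two z-normalized windows
            corr = sum(a * b for a, b in zip(windows[i], windows[j])) / m
            if corr > profile[i]:
                profile[i] = corr
                index[i] = j
    return profile, index


# Toy series (made up): the shape [0, 1, 3, 1, 0] occurs at offsets 2 and 10.
toy = [0.3, 0.1, 0, 1, 3, 1, 0, 0.2, 0.4, 0.1, 0, 1, 3, 1, 0, 0.3]
profile, index = naive_matrix_profile(toy, 5)
print(index[2], round(profile[2], 6))
```

The quadratic double loop is exactly the cost the abstract describes as previously considered infeasible for long time series, which is why the sketch is only suitable for very short inputs.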
Full Text Available
MERLIN++: parameter-free discovery of time series anomalies

https://doi.org/10.1007/s10618-022-00876-7

Nakamura, Takaaki; Mercer, Ryan; Imamura, Makoto; Keogh, Eamonn (March 2023, Data Mining and Knowledge Discovery)

The burgeoning age of IoT has reinforced the need for robust time series anomaly detection. While there are hundreds of anomaly detection methods in the literature, one definition, time series discords, has emerged as a competitive and popular choice for practitioners. Time series discords are subsequences of a time series that are maximally far away from their nearest neighbors. Perhaps the most attractive feature of discords is their simplicity. Unlike many of the parameter-laden methods proposed, discords require only a single parameter to be set by the user: the subsequence length. We believe that the utility of discords is reduced by sensitivity to even this single user choice. The obvious solution to this problem, computing discords of all lengths then selecting the best anomalies (under some measure), appears at first glance to be computationally untenable. However, in this work we discuss MERLIN, a recently introduced algorithm that can efficiently and exactly find discords of all lengths in massive time series archives. By exploiting computational redundancies, MERLIN is two orders of magnitude faster than comparable algorithms. Moreover, we show that by exploiting a little-known indexing technique called Orchard’s algorithm, we can create a new algorithm called MERLIN++, which is an order of magnitude faster than MERLIN, yet produces identical results. We demonstrate the utility of our ideas on a large and diverse set of experiments and show that MERLIN++ can discover subtle anomalies that defy existing algorithms or even careful human inspection. We further compare to five state-of-the-art rival methods, on the largest benchmark dataset for this task, and show that MERLIN++ is superior in terms of accuracy and speed.
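
The discord definition above can be sketched as a brute-force search for a single, fixed subsequence length. This is not MERLIN or MERLIN++, which search all lengths efficiently; it only illustrates the definition. The use of z-normalized Euclidean distance, the rule that overlapping windows are not counted as neighbors, and the toy series are assumptions made for illustration.

```python
import math


def znorm(window):
    """Z-normalize a window; flat windows map to all zeros (an assumption)."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0.0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def top_discord(series, m):
    """Return (start, distance) of the length-m subsequence whose nearest
    non-overlapping neighbor is farthest away (the top discord)."""
    count = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(count)]
    best_start, best_dist = -1, -1.0
    for i in range(count):
        nearest = math.inf
        for j in range(count):
            if abs(i - j) < m:  # skip overlapping (trivial) matches
                continue
            nearest = min(nearest, math.dist(windows[i], windows[j]))
        if nearest != math.inf and nearest > best_dist:
            best_start, best_dist = i, nearest
    return best_start, best_dist


# Toy series (made up): a period-4 pattern with one corrupted sample at 21.
toy = [0, 1, 0, -1] * 10
toy[21] = -1
start, dist = top_discord(toy, 4)
print(start, dist)
```

Here `m` is the single user-set parameter the abstract mentions; rerunning the search for every candidate length is the naive "all lengths" approach that the abstract describes as appearing computationally untenable.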
Full Text Available
Sketching Multidimensional Time Series for Fast Discord Mining

Yeh, Chin-Chia Michael; Zheng, Yan; Pan, Menghai; Chen, Huiyuan; Zhuang, Zhongfang; Wang, Junpeng; Wang, Liang; Zhang, Wei; Phillips, Jeff M.; Keogh, Eamonn (December 2023, IEEE International Conference on Big Data)

Full Text Available
DAMP: accurate time series anomaly detection on trillions of datapoints and ultra-fast arriving data streams

https://doi.org/10.1007/s10618-022-00911-7

Lu, Yue; Wu, Renjie; Mueen, Abdullah; Zuluaga, Maria A.; Keogh, Eamonn (March 2023, Data Mining and Knowledge Discovery)

Full Text Available
Matrix Profile XXIV: Scaling Time Series Anomaly Detection to Trillions of Datapoints and Ultra-fast Arriving Data Streams

https://doi.org/10.1145/3534678.3539271

Lu, Yue; Wu, Renjie; Mueen, Abdullah; Zuluaga, Maria A.; Keogh, Eamonn (August 2022, 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)

Full Text Available
Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress

https://doi.org/10.1109/TKDE.2021.3112126

Wu, Renjie; Keogh, Eamonn (January 2021, IEEE Transactions on Knowledge and Data Engineering)

Time series anomaly detection has been a perennially important topic in data science, with papers dating back to the 1950s. However, in recent years there has been an explosion of interest in this topic, much of it driven by the success of deep learning in other domains and for other time series tasks. Most of these papers test on one or more of a handful of popular benchmark datasets, created by Yahoo, Numenta, NASA, etc. In this work we make a surprising claim. The majority of the individual exemplars in these datasets suffer from one or more of four flaws. Because of these four flaws, we believe that many published comparisons of anomaly detection algorithms may be unreliable, and more importantly, much of the apparent progress in recent years may be illusionary. In addition to demonstrating these claims, with this paper we introduce the UCR Time Series Anomaly Archive. We believe that this resource will play a role similar to that of the UCR Time Series Classification Archive, by providing the community with a benchmark that allows meaningful comparisons between approaches and a meaningful gauge of overall progress.
Full Text Available
Matrix Profile Index Approximation for Streaming Time Series

https://doi.org/10.1109/BigData52589.2021.9671484

Shahcheraghi, Maryam; Cappon, Trevor; Oymak, Samet; Papalexakis, Evangelos; Keogh, Eamonn; Zimmerman, Zachary; Brisk, Philip (December 2021, IEEE International Conference on Big Data)

Full Text Available
When is Early Classification of Time Series Meaningful

https://doi.org/10.1109/TKDE.2021.3108580

Wu, Renjie; Der, Audrey; Keogh, Eamonn (January 2021, IEEE Transactions on Knowledge and Data Engineering)

Since its introduction two decades ago, there has been increasing interest in the problem of early classification of time series. This problem generalizes classic time series classification to ask if we can classify a time series subsequence with sufficient accuracy and confidence after seeing only some prefix of a target pattern. The idea is that the earlier classification would allow us to take immediate action, in a domain in which some practical interventions are possible. For example, that intervention might be sounding an alarm or applying the brakes in an automobile. In this work, we make a surprising claim. In spite of the fact that there are dozens of papers on early classification of time series, it is not clear that any of them could ever work in a real-world setting. The problem is not with the algorithms per se but with the vague and underspecified problem description. Essentially all algorithms make implicit and unwarranted assumptions about the problem that will ensure that they will be plagued by false positives and false negatives even if their results suggested that they could obtain near-perfect results. We will explain our findings with novel insights and experiments and offer recommendations to the community.
Full Text Available
MERLIN: Parameter-Free Discovery of Arbitrary Length Anomalies in Massive Time Series Archives

https://doi.org/10.1109/ICDM50108.2020.00147

Nakamura, Takaaki; Imamura, Makoto; Mercer, Ryan; Keogh, Eamonn (November 2020, ICDM 2020)
Time series anomaly detection remains a perennially important research topic. If anything, it is a task that has become increasingly important in the burgeoning age of IoT. While there are hundreds of anomaly detection methods in the literature, one definition, time series discords, has emerged as a competitive and popular choice for practitioners. Time series discords are subsequences of a time series that are maximally far away from their nearest neighbors. Perhaps the most attractive feature of discords is their simplicity. Unlike many parameter-laden methods, discords require only a single parameter to be set by the user: the subsequence length. In this work we argue that the utility of discords is reduced by sensitivity to this single user choice. The obvious solution to this problem, computing discords of all lengths then selecting the best anomalies (under some measure), seems to be computationally untenable. However, in this work we introduce MERLIN, an algorithm that can efficiently and exactly find discords of all lengths in massive time series archives.
Full Text Available
A Computational System to Support Fully Automated Mark-Recapture Studies of Ants

https://doi.org/10.1109/CSCI49370.2019.00285

Yoon, Carey; Madrid, Frank; West, Mari; Keogh, Eamonn (December 2019, 2019 International Conference on Computational Science and Computational Intelligence (CSCI))

Full Text Available
